ABSTRACT

Two information processing tasks were considered for inclusion in a test battery being
developed for repeated measures investigations of adverse environmental effects. The tests,
adapted from the Rose Battery, were Baron's Graphemic and Phonemic Analysis and Posner's
Letter Classification. Alternate forms of the tests were administered on 15 consecutive
workdays (Monday to Friday) to 21 Navy enlisted men. A total of 20 measures were taken. These
scores were examined to determine when in practice they attained unchanging or linearly
changing means, homogeneous variances, and constant (differentially stable) intertrial
correlations of an acceptably high level (task definition). In general, correlations of the
basic measures tended to become stable with sufficient practice, but derived measures such as
difference, slope, and ratio scores did not attain stability. The Baron and Posner tasks had
high reliabilities and were highly correlated with each other. Preliminary analysis indicated
that both tests may be measuring the same thing. Either the Baron Sense/Nonsense or the Posner
Name score may be recommended for repeated measures experimentation.


INTRODUCTION 

The integrity of mental functioning, including the ability to process
information, is of prime interest in the assessment of human performance.
Therefore, information processing tests were considered for inclusion in
the Performance Evaluation Tests for Environmental Research (PETER)
battery, which was being designed to assess the effects of environmental
stress (Kennedy, Bittner, Harbeson, & Jones, 1981). In a series of
studies, Rose and colleagues examined information processing tasks for the
purpose of assembling them into a battery to assess individual differences
(Rose, 1974, 1978; Rose & Fernandes, 1977; Fernandes & Rose, 1978). In
developing the PETER battery, these tasks were borrowed freely, with the
only intended modification being to extend the number of replications.
Rose provided guidance in the extension of his battery for possible use in
PETER.

Because many administrations are routine in environmental stress studies,
a repeated measures paradigm was adopted. Subjects were tested for
approximately 15 minutes per day on each test for 15 consecutive work
days. This paradigm entailed a sufficient number of trials to permit means
to asymptote or become linearly regular, and provided sufficient data for
provocative tests of the stability of variances and correlations (Bittner
& Carter, 1981). Although these requirements are generally recognized for
analysis and interpretation of repeated measures experimentation (Campbell
& Stanley, 1963; Winer, 1971), the authors know of no other performance
battery which has been standardized according to these criteria.

Previous reports have documented our experiences with various other tasks
from Rose's battery: Stroop (Harbeson, Krause, Kennedy, & Bittner, 1982);
Grammatical Reasoning (Carter, Kennedy, & Bittner, 1981); Letter Search,
and Critical Tracking (Kennedy et al., Nov. 1981). The present report is
concerned with two additional tasks from the eight in Rose and Fernandes
(1977): Graphemic and Phonemic Analysis (Baron, 1973; Baron & McKillop,
1975) and Letter Classification (Posner & Mitchell, 1967).

These tests were selected because they were purported to measure different
information processing constructs. Three other tests were administered at
the same time. Lexical Decision Making and Semantic Memory Retrieval will
be included in a future report (Harbeson & Kennedy, in preparation).
Short-Term Memory Scanning was reported earlier (Kennedy, Bittner, Carter,
Krause, Harbeson, McCafferty, Pepper, & Wiker, 1981). The remaining tests
in Rose and Fernandes were not used because they either had low
test-retest reliability or resembled tests which had already been studied
(Kennedy et al., July 1981). The purpose of this investigation was to
evaluate the suitability of information processing tasks for inclusion in
a battery of performance tasks. Evaluations were aimed at the statistical
suitability of individual measures, and at their uniqueness and economy of
use. Altogether, the purpose was to provide a basis for including tasks in
the Performance Evaluation Tests for Environmental Research (PETER)
Battery.

METHOD 

Task  Descriptions 

Graphemic and Phonemic Analysis. This task was developed by Baron to study
visual (graphemic encoding) versus articulatory (auditory encoding)
reading strategies. Subjects were required to judge whether phrases made
sense or not under three conditions: Sense (our new car), Homophone (its
knot so), or Nonsense (a drop of ran). These were combined in pairs to
form three basic conditions: Sense/Nonsense (SN), Sense/Homophone (SH),
and Homophone/Nonsense (HN). Theoretically, graphemic encoders would do
better on S phrases and acoustic encoders would do better on H phrases.
But since graphemic encoding is faster with normal readers, and is more
common, it would be expected that response times would be least for SN and
greatest for HN. There were 20 phrases in each condition, and the
interstimulus interval was approximately 4 seconds. Following Rose and
Fernandes (1977), twelve variables were recorded: response times for each
of the phrases as a function of condition (6); ratio of SH time to HN;
response time for each of the three conditions; percent of errors; and
mean error time across conditions.

Letter Classification. Posner and Mitchell (1967) used this task to study
matching or recognition of stimuli of various levels of complexity.
Subjects were to make same or different judgments on pairs of letters
based on three criteria. Letters were classified by physical appearance
(AA vs. AB), name identity (Aa vs. Ab), or category (both vowels or both
consonants, such as AE or BC, vs. not matched, such as AB). There were 36
trials per day in each of the first two conditions and 32 in the third.
The interstimulus interval was approximately 4 seconds. Eight scores were
calculated, including response times for each condition for same
judgments, response times for all different judgments, two difference
scores, percent errors, and mean error time.

Subjects 

The subjects were 21 Navy enlisted men between the ages of 18 and 24 who
had volunteered for duty at the Naval Biodynamics Laboratory. One subject
was dropped from the analysis in Graphemic and Phonemic Analysis because
his daily score sheet was lost. All subjects were recruited, evaluated,
and employed in accordance with procedures specified in Secretary of the
Navy Instruction 3900.39 Series and Bureau of Medicine and Surgery
Instruction 3900.6. These Instructions are based upon voluntary consent,
and meet the provisions of prevailing national and international
guidelines. For a detailed description of the subject selection procedure,
see Thomas, Majewski, Ewing, and Gilbert (1978).

Apparatus  and  Procedure 

The stimulus material was presented by means of black and white slides
shown on a Kodak Ektograph 450 Audio Viewer ®. The rate of presentation
was controlled by preprogrammed tape cassettes. Each trial was preceded by
a cueing signal of two clicks. Subjects responded by pushing one of two
buttons (yes or no) on boxes which were fastened to their desk tops. The
response time was measured from the onset of the stimuli to the time the
subject pushed his answer button. The answers and the response times were
displayed on an automatic timing device and recorded on an answer sheet by
the experimenter. The subjects were tested in groups of four beginning at
8:00 AM for 15 consecutive work days. The five tests were administered in
the same order to each group of subjects, but the order was varied across
groups and days. There was a break of 2 or 3 minutes between tests while
the experimenter changed carousels and cassette tapes, and a five minute
break between tests halfway through testing. Total testing time was
approximately an hour and a half including breaks.

RESULTS 

Analysis 

Means, standard deviations, and cross-session correlations were calculated
for each measure. In an initial sensitivity screening, measures with
correlations which averaged below .50, or which were obviously unstable,
were dropped from further analysis. The reliability and the stability of
the correlations of the remaining measures were determined by a general
computer program developed by Steiger (1980). To avoid problems with last
day effects, the 15th day was dropped from the stability analysis (cf.
Carter et al., 1981). Fmax (Winer, 1971) was used to test for the
homogeneity of the variances. Graphical analysis was employed to examine
the stability of the means. Within each task, those measures which
achieved stability were compared. Correlations were calculated between
average scores for each subject over stable days on each measure.^1 These
values were adjusted using the Spearman-Brown Prophecy Formula and the
correction for attenuation to estimate the between-measure correlations. A
similar procedure was followed to compare measures across tasks.
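
The two adjustments named above can be sketched as follows. This is a minimal illustration of the standard textbook formulas, not a reconstruction of the authors' program: the function names and the example values are hypothetical, and the report does not state the lengthening factor used with the Spearman-Brown formula.

```python
import math


def spearman_brown(r: float, k: float) -> float:
    """Spearman-Brown prophecy: reliability of a score lengthened k-fold.

    r is the reliability of the unit-length score (for example, one day);
    k is the lengthening factor (for example, the number of days averaged).
    """
    return k * r / (1.0 + (k - 1.0) * r)


def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Estimate the correlation between two measures free of unreliability.

    r_xy is the observed correlation; r_xx and r_yy are the reliabilities
    of the two measures.
    """
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical values, chosen only to show the direction of each adjustment.
averaged_reliability = spearman_brown(0.50, 4)           # single-day r = .50, 4 days
disattenuated = correct_for_attenuation(0.60, 0.80, 0.80)
print(round(averaged_reliability, 2), round(disattenuated, 2))
```

Both adjustments raise the raw coefficients: averaging over more days increases reliability, and dividing by the geometric mean of two reliabilities below 1.0 increases the estimated between-measure correlation.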

Graphemic  and  Phonemic  Analysis 

Preliminary analysis was done on 12 scores. Three of these were dropped
from further analysis, including the SH/HN Ratio, which had a reliability
of essentially zero. Table 1 shows the results of the differential
stability analysis for the remaining 9 scores. It can be seen that all but
H(S) attained stability. H(N) and HN did not become stable until rather
late in practice. Fmax tests were nonsignificant for all measures. Figure
1 shows the means of SN, SH, and HN. All appear to become stable by about
Day 6. As was predicted in the literature, the HN phrases required the
most time and the SN phrases the least. However, SN and SH times became
more alike with practice. Correlations between all of the measures were
quite high. Those for SN, SH, and HN are shown in Table 2.


TABLE 1. Graphemic and Phonemic Analysis:
Differential Stability Analysis

Score   Stable Days   r      χ²      df    p
S(N)    3-14          .76    78.90   65    .12
N(S)    3-14          .79    61.33   65    .61
SN      3-14          .84    78.46   65    .12
S(H)    4-14          .78    61.58   54    .22
H(S)    --            NOT STABLE           --
SH      6-14          .84    45.13   35    .12
H(N)    12-14         .90     1.57    2    .46
N(H)    4-14          .73    67.09   54    .11
HN      10-14         .88    14.62    9    .10

FIGURE 1: Baron Task mean response times (msec) on
SN, SH, and HN over 15 days (N=20).


TABLE 2. Graphemic and Phonemic Analysis:
Differentially Stabilized Correlations*

Score   SN       SH       HN
SN      (.85)    .84      .85
SH      1.00     (.84)    .86
HN      .99      1.00     (.88)
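
The df column of the differential stability table is consistent with a test that all between-day correlations within the stable span are equal, which leaves one fewer degree of freedom than there are distinct day pairs. The sketch below checks that reading against the tabled values. It is an inference from the reported numbers, not a description of Steiger's (1980) program, and the function name is hypothetical.

```python
def equal_correlation_df(first_day: int, last_day: int) -> int:
    """Degrees of freedom for testing that all between-day correlations
    in the span first_day..last_day are equal (assumed test form).

    With k days there are k*(k-1)/2 distinct correlations; estimating one
    common value leaves k*(k-1)/2 - 1 degrees of freedom.
    """
    n_days = last_day - first_day + 1
    n_correlations = n_days * (n_days - 1) // 2
    return n_correlations - 1


# Stable spans reported for the Baron scores, paired with the tabled df.
reported = {(3, 14): 65, (4, 14): 54, (6, 14): 35, (12, 14): 2, (10, 14): 9}
for (first, last), df in reported.items():
    print(first, last, equal_correlation_df(first, last), df)
```

Under this reading, a short stable span such as Days 12-14 rests on only three correlations, so its nonsignificant result is a much weaker demonstration of stability than one for Days 3-14.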

Letter  Classification 

Of the 8 Posner scores, only the three basic measures qualified for
further analysis. The results of the differential stability analysis are
shown in Table 3. Considerable practice was required, but all conditions
eventually became stable, and reliabilities were very respectable. Again,
Fmax tests were nonsignificant for all measures. Examining Figure 2, it
can be seen that the means appear almost level for the Name and Physical
conditions from about Day 2, and certainly for all conditions by at least
Day 6. The relationship between the means is exactly as predicted in the
literature. As can be seen in Table 4, the three measures were highly
correlated.
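
The Fmax statistic referred to above is the ratio of the largest to the smallest of the daily variances. A minimal sketch follows, using made-up response times rather than the study's data. The function and variable names are hypothetical, and the critical value, which depends on the number of days and the sample size, would still have to be looked up in a table such as those in Winer (1971).

```python
from statistics import variance


def f_max(scores_by_day):
    """Ratio of the largest to the smallest sample variance across days.

    scores_by_day is a list with one inner list of subject scores per day.
    """
    variances = [variance(day) for day in scores_by_day]
    return max(variances) / min(variances)


# Hypothetical response times (msec) for three subjects on three days.
rt_by_day = [
    [500.0, 520.0, 540.0],  # sample variance 400
    [500.0, 510.0, 520.0],  # sample variance 100
    [490.0, 520.0, 550.0],  # sample variance 900
]
print(f_max(rt_by_day))
```

A value near 1.0 indicates homogeneous variances across days; the statistic is declared significant only when it exceeds the tabled critical value.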


TABLE 3. Posner Task:
Differential Stability Analysis

Score      Stable Days   r      χ²      df    p
Physical   10-14         .81    10.21    9    .33
Name       8-14          .83    25.11   20
Category   12-14         .89     3.61    2    .16

FIGURE  2:  Posner  Task  mean  response  times  on 
Physical,  Name,  and  Category  over  15  days 
(N=21). 

TABLE 4. Letter Classification:
Differentially Stabilized Correlations*

Score      Physical   Name     Category
Physical   (.81)      .82      .76
Name       .99        (.83)    .80
Category   .90        .94      (.89)

*Correlations  above,  reliabilities  along,  and  corrected-for-attenuation  estimates  below  the  diagonal. 
Comparison of Tasks

Table 5 shows the intercorrelations of the three basic measures from each
task across differentially stabilized days. There appears to be a large
overlap between the tasks. There is no notable difference between the
correlations of any of the Baron scores. The distinct pattern in the
correlations of the Posner measures might be explained by the parallel
trend in their internal reliabilities (see Table 4). Since the Posner
scores all correlate highly with each other, it is most likely that the
differences seen in Table 5 are the result of error variance rather than a
difference in what is being measured.


TABLE 5. Baron and Posner Task Intercorrelation*

Score   Physical      Name          Category
SN      .58 (.70)     .66 (.78)     .75 (.86)
SH      .57 (.69)     .66 (.79)     .76 (.88)
HN      .53 (.63)     .62 (.73)     .74 (.84)

*Corrected-for-attenuation values in parentheses.
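
As a worked check on the parenthesized values, the standard correction for attenuation can be applied to the SN and Physical entry. The computation assumes that the reliabilities used were the diagonal values reported for SN (.85) and Physical (.81) in the within-task correlation tables; the report does not state this explicitly.

```latex
\hat{r}_{xy} \;=\; \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}}
\;=\; \frac{.58}{\sqrt{(.85)(.81)}}
\;\approx\; \frac{.58}{.830}
\;\approx\; .70
```

Here $r_{xy}$ is the observed correlation between the two measures and $r_{xx}$, $r_{yy}$ are their reliabilities. The result agrees with the tabled value of .70.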


DISCUSSION 

Comparison  with  Previous  Research 

The results of the present study are consistent with past research. Some
of the means obtained in the present study are higher than those obtained
previously (e.g., Rose & Fernandes, 1977), but not dramatically so.
Subject differences are not surprising, considering that the other studies
used college students and the present study employed enlisted men. The
important point is that the same patterns were obtained in all of the
studies.

Comparison  of  Tasks  and  Subtasks 

Two main findings were revealed in the comparison of measures within
tasks. The first was that all of the basic scores in each test appeared to
be measuring the same thing. This phenomenon has been noted in other
studies in this program (Harbeson et al., 1982), but cases in which
subtasks do not converge, even after considerable practice, are also in
evidence (Bittner, Lundy, Kennedy, & Harbeson, 1982). The results also
indicate that there is a large overlap between the Baron and Posner tasks.
Some overlap is to be expected, as many of the same information processing
operations are involved (Carroll, 1974). However, it is possible that both
tasks are measuring the same underlying ability. When tasks or subtasks
are used to measure different constructs or abilities, it is important to
examine the differential relationships. In this study all of the scores
may be measuring different levels of the same thing. Further research
using factor analysis is needed to reach a more definitive conclusion and
will be included in a future report (Harbeson & Kennedy, in preparation).

The second finding was that derived measures such as slope, difference
scores, and ratio scores were either unstable or had extremely low
internal reliability. This has been observed in previous reports from this
laboratory (e.g., Carter & Krause, 1982; Kennedy et al., July 1981). The
lack of reliability of derived measures has also been commented on by
other authors (Cronbach & Furby, 1970).
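
The classical formula for the reliability of a difference score shows why such measures fare poorly when their components are highly correlated. The sketch below assumes equal variances for the two component scores, and the Name minus Physical pairing is used only as an illustration: the report does not say which pairs formed its two Posner difference scores. The function name is hypothetical.

```python
def difference_score_reliability(r_xx: float, r_yy: float, r_xy: float) -> float:
    """Classical reliability of D = X - Y, assuming X and Y have equal variances.

    r_xx and r_yy are the component reliabilities; r_xy is their correlation.
    """
    return (0.5 * (r_xx + r_yy) - r_xy) / (1.0 - r_xy)


# Reported Posner values: Name reliability .83, Physical reliability .81,
# Name-Physical correlation .82.
name_minus_physical = difference_score_reliability(0.83, 0.81, 0.82)
print(round(abs(name_minus_physical), 2))
```

With these values the numerator is essentially zero: the components correlate about as highly with each other as each does with itself, so almost no reliable variance is left in the difference.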

Some  Practical  Considerations 

The methodology used in this study was adequate, as stable, reliable
measures were obtained. However, the administration time was rather long
compared to the actual time the subjects were exposed to the stimulus
material. An interactive computer implementation would probably be more
efficient, eliminating the time taken to record responses and change
carousels and cassettes. This would also decrease subject fatigue. It
would be helpful to have built-in time limits to reduce problems with
outliers. It might also be possible to produce numerous alternate forms in
a manner similar to that used by Carter and Sbisa (1982). With a few
changes in procedure, these tests could be administered in a
paper-and-pencil format to groups or individuals for use in environments
where other equipment would be impractical. Further standardization
studies would be required with any changes in procedure.

Conclusions 

Any of the Baron or Posner basic measures would be suitable for repeated
measures testing. Since they appear to be redundant, at least within
tests, one from each task would be sufficient. Baron SN and Posner Name
have the best psychometric qualities. Alternate forms are easier to
construct for the Posner task. Future research is needed to compare the
two tasks and to determine the qualities of single composite measures. At
present, the Baron SN measure and the Posner Name measure may be
recommended for inclusion in a repeated measures test battery.
FOOTNOTE

^1 Correlating averages was more convenient to use with this large data
set than averaging correlations, which is a more statistically efficient
technique (see Bittner, Dunlap, & Jones, 1982).